Attention in Attention: Modeling Context Correlation for Efficient Video Classification

Authors

Abstract

Attention mechanisms have significantly boosted the performance of video classification neural networks thanks to the utilization of perspective contexts. However, current research on attention generally focuses on adopting a specific aspect of the contexts (e.g., channel, spatial/temporal, or global context) to refine features, and neglects their underlying correlation when computing attentions. This leads to incomplete context utilization and hence bears the weakness of limited improvement. To tackle this problem, this paper proposes an efficient attention-in-attention (AIA) method for element-wise feature refinement, which investigates the feasibility of inserting the channel context into the spatio-temporal attention learning module, referred to as CinST, as well as its reverse variant, STinC. Specifically, we instantiate the dynamics aggregated along each axis with average and max pooling operations. The workflow of the AIA module is that the first attention block uses one kind of context information to guide the gating-weights calculation of the second attention block, which targets the other context. Moreover, all computational operations in the attention units act on the pooled dimension, which results in a very small computational cost increase (< 0.02%). To verify our method, we densely integrate it into two classical video network backbones and conduct extensive experiments on several standard benchmarks. The source code is available at https://github.com/haoyanbin918/Attention-in-Attention .
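The workflow described in the abstract (context pooled along one axis with average and max pooling, then used to gate the attention computed over the other axis) can be illustrated with a minimal NumPy toy. This is only a sketch of the general CinST-style idea under stated assumptions; the function name, the 0.5-weighted avg/max fusion, and the sigmoid gating form are illustrative choices, not the authors' implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cinst_attention(x):
    """Toy CinST-style refinement (hypothetical sketch, not the paper's code).

    x: feature map of shape (C, T, H, W).
    Channel context pooled over the spatio-temporal axes gates the
    spatio-temporal attention, which then refines x element-wise.
    """
    C, T, H, W = x.shape
    # Channel context: average and max pooling along the spatio-temporal axes.
    chan_ctx = 0.5 * (x.mean(axis=(1, 2, 3)) + x.max(axis=(1, 2, 3)))  # (C,)
    chan_gate = sigmoid(chan_ctx).reshape(C, 1, 1, 1)
    # Spatio-temporal context computed on the channel-gated features:
    # pool along the channel axis (again avg + max).
    gated = x * chan_gate
    st_ctx = 0.5 * (gated.mean(axis=0) + gated.max(axis=0))  # (T, H, W)
    st_attn = sigmoid(st_ctx)[None, ...]  # broadcast over channels
    # Element-wise feature refinement with the resulting attention map.
    return x * st_attn

x = np.random.randn(8, 4, 6, 6)
y = cinst_attention(x)
assert y.shape == x.shape
```

Note that every reduction here acts on an already-pooled, low-dimensional context (a length-C vector or a T×H×W map), which is why the abstract can claim a negligible (< 0.02%) cost increase over the backbone.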

Similar Articles

Attention Clusters: Purely Attention Based Local Feature Integration for Video Classification

Recently, substantial research effort has focused on how to apply CNNs or RNNs to better extract temporal patterns from videos, so as to improve the accuracy of video classification. In this paper, however, we show that temporal information, especially longer-term patterns, may not be necessary to achieve competitive results on common video classification datasets. We investigate the potential ...

Advancing Connectionist Temporal Classification With Attention Modeling

In this study, we propose advancing all-neural speech recognition by directly incorporating attention modeling within the Connectionist Temporal Classification (CTC) framework. In particular, we derive new context vectors using time convolution features to model attention as part of the CTC network. To further improve attention modeling, we utilize content information extracted from a network r...

On the Use of Spatiotemporal Visual Attention for Video Classification

It is common sense among experts that visual attention plays an important role in perception, being necessary for obtaining salient information about the surroundings. It may be the “glue” that binds simple visual features into an object [1]. Having proposed a spatiotemporal model for visual attention in the past, we elaborate on this work and use it for video classification. Our claim is that ...

A spatiotemporal model with visual attention for video classification

High level understanding of sequential visual input is important for safe and stable autonomy, especially in localization and object detection. While traditional object classification and tracking approaches are specifically designed to handle variations in rotation and scale, current state-of-the-art approaches based on deep learning achieve better performance. This paper focuses on developing...

Social Attention: Modeling Attention in Human Crowds

Robots that navigate through human crowds need to be able to plan safe, efficient, and human predictable trajectories. This is a particularly challenging problem as it requires the robot to predict future human trajectories within a crowd where everyone implicitly cooperates with each other to avoid collisions. Previous approaches to human trajectory prediction have modeled the interactions bet...


Journal

Journal title: IEEE Transactions on Circuits and Systems for Video Technology

Year: 2022

ISSN: 1051-8215, 1558-2205

DOI: https://doi.org/10.1109/tcsvt.2022.3169842